Approximate Solution Techniques for Factored First-Order MDPs

Authors

  • Scott Sanner
  • Craig Boutilier
Abstract

Most traditional approaches to probabilistic planning in relationally specified MDPs rely on grounding the problem w.r.t. specific domain instantiations, thereby incurring a combinatorial blowup in the representation. An alternative approach is to lift a relational MDP to a first-order MDP (FOMDP) specification and develop solution approaches that avoid grounding. Unfortunately, state-of-the-art FOMDPs are inadequate for specifying factored transition models or additive rewards that scale with the domain size: structure that is very natural in probabilistic planning problems. To remedy these deficiencies, we propose an extension of the FOMDP formalism known as a factored FOMDP and present generalizations of symbolic dynamic programming and linear-value approximation solutions that exploit its structure. Along the way, we also make contributions to the field of first-order probabilistic inference (FOPI) by demonstrating novel first-order structures that can be exploited without domain grounding. We present empirical results demonstrating that we can obtain solutions whose complexity scales polynomially in the logarithm of the domain size, results that are impossible to obtain with any previously proposed solution method.

Introduction

There has been a great deal of research in recent years aimed at exploiting structure to compactly represent and efficiently solve decision-theoretic planning problems in the Markov decision process (MDP) framework (Boutilier, Dean, & Hanks 1999). While traditional approaches to solving MDPs typically used an enumerated state and action model, this approach has proven impractical for large-scale AI planning tasks, where the number of distinct states can easily exceed the limits of primary and secondary storage on modern computers. Fortunately, many MDPs can be compactly described in a propositionally factored model that exploits various independences in the reward and transition functions. Not only can this independence be exploited in the problem representation, it can often be exploited in exact and approximate solution methods as well (Hoey et al. 1999; St-Aubin, Hoey, & Boutilier 2000; Guestrin et al. 2002).

However, while recent techniques for factored MDPs have proven effective, they cannot generally exploit first-order structure. Yet many realistic planning domains are best represented in first-order terms, exploiting the existence of domain objects, relations over these objects, and the ability to express objectives and action effects using quantification. These deficiencies have motivated the development of the first-order MDP (FOMDP) framework (Boutilier, Reiter, & Price 2001), which directly exploits the first-order representation of MDPs to obtain domain-independent solutions. While FOMDP approaches have demonstrated much promise (Kersting, van Otterlo, & de Raedt 2004; Karabaev & Skvortsova 2005; Sanner & Boutilier 2005; 2006), current formalisms are inadequate for specifying both factored actions and additive rewards in a fashion that allows reasonable scaling with the domain size. To remedy these deficiencies, we propose a novel extension of the FOMDP formalism known as a factored FOMDP. This representation introduces product and sum aggregator extensions to the FOMDP formalism that permit the specification of factored transition and additive reward models that scale with the domain size.
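To make the aggregator idea concrete, the following is a schematic illustration (the notation is ours, not the paper's formal language): in a SysAdmin-style domain with objects $c$ of type $\mathit{Computer}$, an additive reward that grows with the number of objects and a transition model that factors as a product over objects might be written as

$$R(s) \;=\; \sum_{c \,:\, \mathit{Computer}(c)} \mathbb{1}\big[\mathit{running}(c, s)\big], \qquad P(s' \mid s, a) \;=\; \prod_{c \,:\, \mathit{Computer}(c)} P\big(\mathit{running}(c, s') \mid s, a\big),$$

with the sum aggregator capturing the additive reward and the product aggregator capturing the factored transition.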
We then generalize symbolic dynamic programming and linear-value approximation techniques to exploit product and sum aggregator structure, solving a number of novel problems in first-order probabilistic inference (FOPI) in the process. Having done this, we present empirical results demonstrating that we can obtain effective solutions on the well-studied SYSADMIN problem whose complexity scales polynomially in the logarithm of the domain size, results that are impossible to obtain with any previously proposed solution method.

Markov Decision Processes

Factored Representation

In a factored MDP, states are represented by vectors $\vec{x}$ of length $n$, where for simplicity we assume the state variables $x_1, \ldots, x_n$ have domain $\{0, 1\}$; hence the total number of states is $N = 2^n$. We also assume a set of actions $A = \{a_1, \ldots, a_n\}$. An MDP is defined by: (1) a state transition model $P(\vec{x}' \mid \vec{x}, a)$, which specifies the probability of the next state $\vec{x}'$ given the current state $\vec{x}$ and action $a$; and (2) a reward function $R(\vec{x}, a)$, which specifies the immediate reward received for taking action $a$ in state $\vec{x}$.
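To ground this definition, here is a minimal runnable Python sketch of a factored transition and additive reward, patterned loosely after the SYSADMIN domain mentioned above; the ring topology, the probabilities, and all function names are our own illustrative assumptions, not the paper's exact parameterization:

import random

N = 5  # number of machines; a state is a vector of N booleans

def neighbors(i):
    # Ring topology: machine i depends on its predecessor only.
    return [(i - 1) % N]

def transition(state, action):
    # Each next-state variable x_i' depends only on x_i, the neighbors
    # of i, and the action: exactly the locality a DBN encodes and that
    # the paper's product aggregator lifts to the first-order level.
    next_state = []
    for i in range(N):
        if action == i:                      # reboot machine i
            p_up = 1.0
        elif state[i]:
            frac = sum(state[j] for j in neighbors(i)) / len(neighbors(i))
            p_up = 0.05 + 0.90 * frac        # neighbor failures propagate
        else:
            p_up = 0.05                      # rarely recovers unaided
        next_state.append(random.random() < p_up)
    return next_state

def reward(state, action):
    # Additive reward (+1 per running machine): the sum-aggregator
    # structure that scales with the domain size.
    return sum(state)

state = [True] * N
state = transition(state, action=2)
print(reward(state, action=2))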

Similar articles

Efficient Solution Algorithms for Factored MDPs

This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MD...

Max-norm Projections for Factored MDPs

Markov Decision Processes (MDPs) provide a coherent mathematical framework for planning under uncertainty. However, exact MDP solution algorithms require the manipulation of a value function, which specifies a value for each state in the system. Most real-world MDPs are too large for such a representation to be feasible, preventing the use of exact MDP algorithms. Various approximate solution a...

Exploiting Anonymity in Approximate Linear Programming: Scaling to Large Multiagent MDPs (Extended Version)

Many exact and approximate solution methods for Markov Decision Processes (MDPs) attempt to exploit structure in the problem and are based on factorization of the value function. Multiagent settings in particular, however, are known to suffer from an exponential increase in value component sizes as interactions become denser, meaning that approximation architectures are restricted in the problem s...

Fast Approximate Hierarchical Solution of MDPs

In this thesis, we present an efficient algorithm for creating and solving hierarchical models of large Markov decision processes (MDPs). As the size of the MDP increases, finding an exact solution becomes intractable, so we expect only to find an approximate solution. We also assume that the hierarchies we create are not necessarily applicable to more than one problem so that we must be able t...

Linear Program Approximations for Factored Continuous-State Markov Decision Processes

Approximate linear programming (ALP) has emerged recently as one of the most promising methods for solving complex factored MDPs with finite state spaces. In this work we show that ALP solutions are not limited only to MDPs with finite state spaces, but that they can also be applied successfully to factored continuous-state MDPs (CMDPs). We show how one can build an ALP-based approximation for ...
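To make the ALP construction described in this entry concrete, here is a hedged sketch for a toy two-variable, finite-state MDP; the dynamics, basis functions, and all parameters are our own assumptions rather than anything from the cited work. It represents $V(\vec{x}) \approx \sum_i w_i h_i(\vec{x})$ and solves for the weights with a single linear program:

import itertools
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
states = list(itertools.product([0, 1], repeat=2))  # 4 factored states
actions = [0, 1]                                    # action a targets x_a

def P(ns, s, a):
    # Toy factored dynamics: action a sets x_a to 1 with prob. 0.9;
    # the other variable persists with prob. 0.8.
    p = 1.0
    for i in range(2):
        p_one = 0.9 if i == a else (0.8 if s[i] == 1 else 0.2)
        p *= p_one if ns[i] == 1 else 1.0 - p_one
    return p

def R(s, a):
    return float(sum(s))  # additive reward: +1 per variable that is set

def h(s):
    # Basis functions: a constant plus one indicator per state variable.
    return np.array([1.0, s[0], s[1]])

# ALP: minimize sum_x w.h(x)
#      subject to w.h(x) >= R(x,a) + gamma * E[w.h(x') | x, a] for all x, a
c = sum(h(s) for s in states)
A_ub, b_ub = [], []
for s in states:
    for a in actions:
        expected_h = sum(P(ns, s, a) * h(ns) for ns in states)
        A_ub.append(-(h(s) - gamma * expected_h))  # flip to <= form
        b_ub.append(-R(s, a))

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * 3)
print("basis-function weights:", res.x)

The constraint set here enumerates every state and action; factored and first-order ALP methods keep the same linear form while avoiding exactly this enumeration.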


Publication date: 2007